Vocabulary Density Method for Customized Indexing of MEDLINE Journals
Abstract
Automated indexing of MEDLINE citations remains a challenging problem due to the growing volume of citations and the over 27,000 MeSH indexing terms that can be assigned to them. This paper presents a corpus-based approach to improving indexing for specific journals. The Vocabulary Density approach takes into account frequencies of indexing terms previously assigned to a journal when recommending indexing terms for a new citation in that journal. After implementing the approach, we saw a 2.69 (4.44%) improvement in Precision.

Introduction

The successes in automatic indexing of MEDLINE citations using the NLM Medical Text Indexer (MTI) have led to First Line (MTIFL) indexing of several journals. MeSH terms automatically assigned by MTIFL provide the initial indexing to a MEDLINE citation which is then reviewed and completed by an indexer. As we expand the set of journals indexed via MTIFL, we continue seeking improvements to the MTI algorithms. Potential for improvement using journal-specific data was discussed by Tsoumakas et al. To explore whether customizing MTIFL indexing of a journal is worthwhile, we have implemented a simple Vocabulary Density approach for all journals indexed by MTI.

Methods

We used 3,401,111 citations involving 6,606 journals from the 2014 MEDLINE Baseline that have been indexed over the last five years (henceforth referred to as Corpus). For each MeSH Heading (MH) used by each journal we captured the number of its occurrences (NOM) and the Number of Articles in the journal (NOA). We then normalized the frequency of each MH in the journal, computing Factor = NOM / NOA. For example, the MH “Swiss 3T3 Cells” occurred four times in the 2,231 articles of the journal “Biochemical Society (Great Britain)” in the Corpus. The Factor for this MH is 0.001793 (4 / 2231). We applied the Vocabulary Density method to journals that had at least 80 citations in the Corpus and to MHs introduced at least a year ago.
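The Factor computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MTI implementation: the function name `vocabulary_density` and the input layout (pairs of journal name and list of MHs) are hypothetical, each MH is assumed to count at most once per article, and the restriction to MHs introduced at least a year ago is not modeled.

```python
from collections import Counter, defaultdict

MIN_CITATIONS = 80  # minimum number of Corpus citations per journal (from the text)


def vocabulary_density(citations, min_citations=MIN_CITATIONS):
    """Return {journal: {MH: Factor}} with Factor = NOM / NOA.

    `citations` is an iterable of (journal, list_of_MHs) pairs.
    This layout is a hypothetical one chosen for the sketch.
    """
    noa = Counter()             # NOA: number of articles per journal
    nom = defaultdict(Counter)  # NOM: occurrences of each MH per journal
    for journal, headings in citations:
        noa[journal] += 1
        # Assumption: an MH is counted once per article.
        nom[journal].update(set(headings))
    return {
        journal: {mh: count / noa[journal] for mh, count in counts.items()}
        for journal, counts in nom.items()
        if noa[journal] >= min_citations  # skip journals with too few citations
    }
```

Feeding the sketch 2,231 articles for one journal, four of which carry “Swiss 3T3 Cells”, reproduces the Factor from the example above (4 / 2231, which rounds to 0.001793).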
Given the Vocabulary Density information, MTI does not recommend MHs that have never been used for the journal, and it automatically recommends MHs with a Factor > 0.74. For frequently occurring MHs, e.g., Female or Humans, the threshold for automatic recommendation is raised to 1 to reduce incorrect recommendations.
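The two rules above can be illustrated with a short Python sketch. It is a reading of the text rather than the actual MTI code: the function name `apply_vocabulary_density` is hypothetical, the set of frequently occurring MHs contains only the two examples named in the text, and "threshold is 1" is interpreted here as Factor >= 1, which is an assumption.

```python
AUTO_THRESHOLD = 0.74                # from the text: recommend when Factor > 0.74
FREQUENT_MH_THRESHOLD = 1.0          # from the text: stricter threshold for frequent MHs
FREQUENT_MHS = {"Female", "Humans"}  # only the two examples given in the text


def apply_vocabulary_density(candidates, journal_factors):
    """Adjust the candidate MHs for one citation using the journal's Factors.

    `candidates` is the list of MHs proposed for the citation and
    `journal_factors` maps each MH used by the journal to its Factor.
    """
    # Rule 1: drop candidates that were never used for this journal.
    kept = [mh for mh in candidates if mh in journal_factors]
    # Rule 2: automatically add MHs whose Factor passes the threshold.
    for mh, factor in journal_factors.items():
        if mh in FREQUENT_MHS:
            # Assumption: "threshold is 1" is read as Factor >= 1.
            auto = factor >= FREQUENT_MH_THRESHOLD
        else:
            auto = factor > AUTO_THRESHOLD
        if auto and mh not in kept:
            kept.append(mh)
    return kept
```

With this reading, a frequent MH such as Humans is added automatically only when it appears in every article of the journal in the Corpus, while any other MH is added once its Factor exceeds 0.74.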
Similar resources
Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty
Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Tw...
The NLM Medical Text Indexer System for Indexing Biomedical Literature
In the face of a growing workload and dwindling resources, the US National Library of Medicine (NLM) created the Indexing Initiative project in the mid-1990s. This cross-library team’s mission is to explore indexing methodologies that can help ensure that MEDLINE and other NLM document collections maintain their quality and currency and thereby contribute to NLM’s mission of maintaining quality...
The Indexing Initiative A Report to the Board of Scientific Counselors of the Lister Hill National Center for Biomedical Communications
For more than 150 years, the National Library of Medicine has provided access to the biomedical journal literature through the analytical efforts of human indexers. Since 1966, access has been provided in the form of electronically searchable document surrogates consisting of bibliographic citations, descriptors assigned by indexers from the MeSH® controlled vocabulary (MeSH, 1998) and, since ...
A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning
We present a methodology that automatically selects indexing algorithms for each heading in Medical Subject Headings (MeSH), National Library of Medicine’s vocabulary for indexing MEDLINE. While manually comparing indexing methods is manageable with a limited number of MeSH headings, a large number of them make automation of this selection desirable. Results show that this process can be automa...
Automatically Controlled-Vocabulary Indexing for Text Retrieval
The IR society has made efforts in free-term indexing for a long time. By contrast, few efforts are made in controlled-vocabulary indexing. A new model for controlled-vocabulary indexing is proposed in this paper. This proposed model, TF×OSDF×CSIDF, distinguishes subject-specific words from common words and domain-specific words in documents. 60,400 MEDLINE records are used as training data and ...
Publication date: 2014